Modular Networks: Learning to Decompose Neural Computation
Scaling model capacity has been vital in the success of deep learning. For a
typical network, necessary compute resources and training time grow
dramatically with model size. Conditional computation is a promising way to
increase the number of parameters with a relatively small increase in
resources. We propose a training algorithm that flexibly chooses neural modules
based on the data to be processed. Both the decomposition and modules are
learned end-to-end. In contrast to existing approaches, training does not rely
on regularization to enforce diversity in module use. We apply modular networks
both to image recognition and language modeling tasks, where we achieve
superior performance compared to several baselines. Introspection reveals that
modules specialize in interpretable contexts.
Comment: NIPS 2018
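The conditional-computation idea in this abstract, a controller choosing which module processes each input, can be illustrated with a minimal hard-routing forward pass. The shapes, the linear modules, and the argmax controller below are illustrative assumptions, not the paper's training algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: K candidate modules, each a small linear map.
K, d_in, d_out = 4, 8, 8
modules = [rng.normal(size=(d_in, d_out)) for _ in range(K)]
controller = rng.normal(size=(d_in, K))   # scores the K modules per input

def forward(x):
    """Route each input through the single highest-scoring module."""
    scores = x @ controller               # (batch, K) module scores
    choice = scores.argmax(axis=1)        # hard, data-dependent selection
    out = np.empty((x.shape[0], d_out))
    for k in range(K):
        mask = choice == k
        if mask.any():                    # only chosen modules do any work
            out[mask] = x[mask] @ modules[k]
    return out, choice

x = rng.normal(size=(16, d_in))
y, choice = forward(x)
print(y.shape, np.bincount(choice, minlength=K))
```

Only the selected module's parameters touch each input, which is what lets parameter count grow faster than per-example compute; learning this decomposition end-to-end without diversity regularizers is the paper's contribution and is not attempted in this sketch.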
Transfer Learning for Speech Recognition on a Budget
End-to-end training of automated speech recognition (ASR) systems requires
massive data and compute resources. We explore transfer learning based on model
adaptation as an approach for training ASR models under constrained GPU memory,
throughput and training data. We conduct several systematic experiments
adapting a Wav2Letter convolutional neural network originally trained for
English ASR to the German language. We show that this technique allows faster
training on consumer-grade resources while requiring less training data in
order to achieve the same accuracy, thereby lowering the cost of training ASR
models in other languages. Model introspection revealed that small adaptations
to the network's weights were sufficient for good performance, especially for
inner layers.
Comment: Accepted for the 2nd ACL Workshop on Representation Learning for NLP
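The model-adaptation strategy described above amounts to updating only a subset of a pretrained network's weights while the rest stay frozen. A minimal sketch, assuming a toy dictionary of layer weights rather than the actual Wav2Letter architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a pretrained layer stack (not Wav2Letter).
layers = {f"conv{i}": rng.normal(size=(32, 32)) for i in range(6)}

# Adapt only a chosen subset of layers; everything else stays frozen,
# which is what keeps GPU memory and data requirements low.
trainable = {"conv0", "conv5"}

def sgd_step(grads, lr=1e-2):
    """Apply a gradient step, skipping frozen layers."""
    for name, g in grads.items():
        if name in trainable:
            layers[name] -= lr * g

before = {k: v.copy() for k, v in layers.items()}
sgd_step({k: np.ones_like(v) for k, v in layers.items()})
changed = sorted(k for k in layers if not np.allclose(layers[k], before[k]))
print(changed)   # prints ['conv0', 'conv5']
```

Which layers to unfreeze is a design choice; the abstract's introspection result (small changes to inner layers suffice) suggests the adaptable subset can be small.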
The Benefits of Model-Based Generalization in Reinforcement Learning
Model-Based Reinforcement Learning (RL) is widely believed to have the
potential to improve sample efficiency by allowing an agent to synthesize large
amounts of imagined experience. Experience Replay (ER) can be considered a
simple kind of model, which has proved extremely effective at improving the
stability and efficiency of deep RL. In principle, a learned parametric model
could improve on ER by generalizing from real experience to augment the dataset
with additional plausible experience. However, owing to the many design choices
involved in empirically successful algorithms, it can be very hard to establish
where the benefits are actually coming from. Here, we provide theoretical and
empirical insight into when, and how, we can expect data generated by a learned
model to be useful. First, we provide a general theorem motivating how learning
a model as an intermediate step can narrow down the set of possible value
functions more than learning a value function directly from data using the
Bellman equation. Second, we provide an illustrative example showing
empirically how a similar effect occurs in a more concrete setting with neural
network function approximation. Finally, we provide extensive experiments
showing the benefit of model-based learning for online RL in environments with
combinatorial complexity, but factored structure that allows a learned model to
generalize. In these experiments, we take care to control for other factors in
order to isolate, insofar as possible, the benefit of using experience
generated by a learned model relative to ER alone.
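The contrast the abstract draws between replaying stored transitions (ER) and generating additional experience from a learned model can be made concrete with a Dyna-style sketch; the chain MDP, the tabular stand-in "model", and all constants below are invented for illustration:

```python
import random

random.seed(0)

# Toy deterministic chain MDP: states 0..4, actions in {-1, +1},
# reward 1 on reaching (or staying at) state 4.
def env_step(s, a):
    s2 = max(0, min(4, s + a))
    return s2, float(s2 == 4)

# Collect real experience into a replay buffer with two fixed sweeps.
buffer = []
s = 0
for a in ([1] * 6 + [-1] * 6) * 2:
    s2, r = env_step(s, a)
    buffer.append((s, a, r, s2))
    s = s2

# Stand-in "learned model": a table fit to the observed transitions
# (the abstract's parametric models would generalize beyond the data).
model = {(s, a): (s2, r) for s, a, r, s2 in buffer}

# Dyna-style updates: each real (ER) sample seeds a short imagined rollout.
Q = {(s, a): 0.0 for s in range(5) for a in (-1, 1)}
gamma, alpha = 0.9, 0.5
for _ in range(200):
    s, a, r, s2 = random.choice(buffer)            # real experience
    for _ in range(5):                             # model-generated experience
        target = r + gamma * max(Q[(s2, b)] for b in (-1, 1))
        Q[(s, a)] += alpha * (target - Q[(s, a)])
        s, a = s2, random.choice([-1, 1])
        if (s, a) not in model:
            break
        s2, r = model[(s, a)]
print(round(Q[(3, 1)], 3))
```

A tabular lookup cannot generalize, so it only illustrates the update mechanics; the paper's point is precisely that a parametric model can produce plausible transitions the buffer never contained.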
Presentation of the first issue of Travail et apprentissages. Revue de didactique professionnelle
There is decidedly something new in the analysis of work! From the pen of Pierre Roche, in this same section of issue 99 of Formation Emploi, one could read a presentation of Dominique Lhuillier's book "Cliniques du Travail" (Roche, 2007). Today it is a new journal, "Travail et Apprentissages - Revue de Didactique Professionnelle", whose first issue appeared in February 2008. Published with the support of the association "Recherches et pratiques en didactiq..
Causal diffusion and its backwards diffusion problem
This article starts over the backwards diffusion problem by replacing the
\emph{noncausal} diffusion equation, the direct problem, by the \emph{causal}
diffusion model developed in \cite{Kow11} for the case of constant diffusion
speed. For this purpose we derive an analytic representation of the Green
function of causal diffusion in the wave vector-time space for arbitrary (wave
vector) dimension N. We prove that the respective backwards diffusion problem
is ill-posed, but not exponentially ill-posed, if the data acquisition time is
larger than a characteristic time period for space dimensions N >= 2. In
contrast to the noncausal case, the inverse problem is
well-posed for N=1. Moreover, we perform a theoretical and numerical comparison
between causal and noncausal diffusion in the \emph{space-time domain} and the
\emph{wave vector-time domain}. The paper is concluded with numerical
simulations of the backwards diffusion problem via the Landweber method.
Comment: In the replacement I have rewritten the abstract and the
introduction. Moreover, I have added Remark 1 and slightly simplified the
proof of Theorem 4. Reference 25 is updated, since the paper is now published.
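The Landweber method mentioned in the closing sentence iterates x <- x + omega * A^T (y - A x) for a linear problem A x = y, with 0 < omega < 2 / ||A||^2. A small self-contained sketch on a made-up, badly conditioned operator (not the causal-diffusion operator from the paper):

```python
import numpy as np

# Toy ill-posed linear problem A x = y; the operator is invented
# purely to show the iteration's behavior.
A = np.array([[1.0, 0.0],
              [0.0, 0.1]])
x_true = np.array([1.0, 2.0])
y = A @ x_true

# Landweber iteration with step size below the stability bound 2/||A||^2.
omega = 1.0 / np.linalg.norm(A, 2) ** 2
x = np.zeros(2)
for _ in range(2000):
    x = x + omega * A.T @ (y - A @ x)

print(x)
```

The poorly resolved component converges slowly, so with noisy data one stops the iteration early; the stopping index then acts as the regularization parameter, which is why the method suits ill-posed backwards problems.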
The Languini Kitchen: Enabling Language Modelling Research at Different Scales of Compute
The Languini Kitchen serves as both a research collective and codebase
designed to empower researchers with limited computational resources to
contribute meaningfully to the field of language modelling. We introduce an
experimental protocol that enables model comparisons based on equivalent
compute, measured in accelerator hours. The number of tokens on which a model
is trained is defined by the model's throughput and the chosen compute class.
Notably, this approach avoids constraints on critical hyperparameters which
affect total parameters or floating-point operations. For evaluation, we
pre-process an existing large, diverse, and high-quality dataset of books that
surpasses existing academic benchmarks in quality, diversity, and document
length. On it, we compare methods based on their empirical scaling trends which
are estimated through experiments at various levels of compute. This work also
provides two baseline models: a feed-forward model derived from the GPT-2
architecture and a recurrent model in the form of a novel LSTM with ten-fold
throughput. While the GPT baseline achieves better perplexity throughout all
our levels of compute, our LSTM baseline exhibits a predictable and more
favourable scaling law. This is due to the improved throughput and the need for
fewer training tokens to achieve the same decrease in test perplexity.
Extrapolating the scaling laws of both models results in an intersection
at roughly 50,000 accelerator hours. We hope this work can serve as the
foundation for meaningful and reproducible language modelling research.
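The accounting described above (a token budget fixed by measured throughput times the compute class, and two empirical scaling curves that cross when extrapolated) can be illustrated numerically; the throughput figure and power-law coefficients below are made up, not the paper's fits:

```python
import math

def tokens_for_budget(tokens_per_hour, accelerator_hours):
    """Token budget implied by measured throughput and the compute class."""
    return tokens_per_hour * accelerator_hours

def intersection(a1, b1, a2, b2):
    """Budget C where a1 * C**-b1 == a2 * C**-b2 (requires b1 != b2)."""
    return math.exp(math.log(a1 / a2) / (b1 - b2))

# Made-up coefficients: curve 2 starts worse (larger a) but scales
# better (larger b), so it overtakes curve 1 at some budget C.
C = intersection(a1=5.0, b1=0.06, a2=6.0, b2=0.08)
print(tokens_for_budget(2_000_000, 6), round(C))
```

The same two-power-law calculation is what an "intersection at roughly 50,000 accelerator hours" amounts to, once each model's loss-versus-compute curve has been fit at several budgets.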
A test of positive suggestions about side effects as a way of enhancing the analgesic response to NSAIDs
Side effects are frequent in pharmacological pain management, potentially preceding analgesia and limiting drug tolerability. Discussing side effects is part of informed consent, yet can favor nocebo effects. This study aimed to test whether a positive suggestion regarding side effects, which could act as reminders of the medication having been absorbed, might favor analgesia in a clinical interaction model. Sixty-six healthy males participated in a study "to validate pupillometry as an objective measure of analgesia". Participants were unknowingly randomized double-blind to positive vs control information about side effects embedded in a video regarding the study drugs. Sequences of moderately painful heat stimuli applied before and after treatment with diclofenac and atropine served to evaluate analgesia. Atropine was deceptively presented as a co-analgesic, but was used to induce side effects. Adverse events (AE) were collected with the General Assessment of Side Effects (GASE) questionnaire prior to the second induced pain sequence. Debriefing fully informed participants regarding the purpose of the study and showed them the two videos. The combination of medications led to significant analgesia, without a between-group difference. Positive information about side effects increased the attribution of AEs to the treatment compared to the control information. The total GASE score was correlated with analgesia, i.e., the more AEs reported, the stronger the analgesia. Interestingly, there was a significant between-group difference in this correlation: the GASE score and analgesia correlated only in the positive information group. This provides evidence for a selective link between AEs and pain relief in the group who received the suggestion that AEs could be taken as a sign "that help was on the way". During debriefing, 65% of participants said they would prefer to receive the positive message in a clinical context.
Although the present results cannot be translated immediately to clinical pain conditions, they do indicate the importance of testing this type of modulation in a clinical context.